Multi-seed Lossless Filtration (Extended Abstract)

Authors

  • Gregory Kucherov
  • Laurent Noé
  • Mikhail A. Roytberg
Abstract

We study a method of seed-based lossless filtration for approximate string matching and related applications. The method is based on the simultaneous use of several spaced seeds, rather than the single seed studied by Burkhardt and Kärkkäinen [1]. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.
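To make the notion of a lossless seed family concrete, here is a minimal brute-force sketch. It is not the authors' algorithm (the abstract does not describe the algorithms), only an illustration under stated assumptions: a seed is written as a string where `#` marks a position that must match and `-` marks a don't-care position, and a family is taken to be lossless for a pair (m, k) if every alignment of length m with at most k mismatches is hit by at least one seed at some offset. The function names are hypothetical.

```python
from itertools import combinations


def seed_hits(seed, mismatches, m):
    """Return True if `seed` fits somewhere in an alignment of length m
    so that none of its '#' positions falls on a mismatch."""
    required = [i for i, c in enumerate(seed) if c == "#"]
    for start in range(m - len(seed) + 1):
        if all(start + offset not in mismatches for offset in required):
            return True
    return False


def is_lossless(family, m, k):
    """Brute-force check (assumed definition): every placement of k
    mismatches among m positions must be hit by some seed of the family.
    Checking exactly k mismatches suffices, since removing a mismatch
    can only make a hit easier."""
    k = min(k, m)
    for positions in combinations(range(m), k):
        mismatches = set(positions)
        if not any(seed_hits(seed, mismatches, m) for seed in family):
            return False
    return True


# Two seeds together cover a case that neither covers alone.
print(is_lossless(["##", "#-#"], 3, 1))
print(is_lossless(["##"], 3, 1))
print(is_lossless(["#-#"], 3, 1))
```

The enumeration grows combinatorially with m and k, so this sketch is only suitable for toy sizes. It shows why several seeds can succeed where a single seed fails.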


Similar resources

Spaced Seeds Design Using Perfect Rulers

We consider the problem of lossless spaced seed design for approximate pattern matching. We show that, using mathematical objects known as perfect rulers, we can derive a family of spaced seeds for matching with up to two errors. We analyze these seeds with respect to the trade-off they offer between seed weight and the minimum length of the pattern to be matched. We prove that for patterns of ...


Lossless Seeds for Searching Short Patterns with High Error Rates

We address the problem of approximate pattern matching using the Levenshtein distance. Given a text T and a pattern P, find all locations in T that differ by at most k errors from P. For that purpose, we propose a filtration algorithm that is based on a novel type of seeds, combining exact parts and parts with a fixed number of errors. Experimental tests show that the method is specifically w...


Fast Noise Suppression for Lossless Image Coding (Extended Preprint)

Tilo Strutz, University of Rostock, Institute of Communications and Information Electronics, Richard-Wagner-Str. 31, 18119 Rostock, FRG. Abstract: This contribution presents a new denoising method for applications requiring preservation of highest visual image quality. The aim of the new approach is not to suppress all noise ins...


Recherche de similarités dans les séquences d'ADN : modèles et algorithmes pour la conception de graines efficaces

Most commonly used similarity search methods in genomic sequences are heuristic ones. These are based upon text filtering that allows one to infer potential regions of similarity. This thesis proposes new filter definitions to search for similarities in genomic sequences, and fast algorithms to measure the efficiency of these filters. More precisely, we study the spaced seed model and propos...


The Integrated Supply Chain of After-sales Services Model: A Multi-objective Scatter Search Optimization Approach

Abstract: In recent decades, the high profits of extended warranties have led third-party firms to regard them as a lucrative after-sales service. However, the division of customers by risk aversion and the effect of an extended warranty offer on the manufacturer's basic warranty should be investigated when adjusting such services. Since risk-averse customers welcome extended warranty, while the cu...




Publication date: 2004